Generalized Second-Order Value Iteration in Markov Decision Processes

Authors

Abstract

Value iteration is a fixed-point iteration technique utilized to obtain the optimal value function and policy in a discounted-reward Markov decision process (MDP). Here, a contraction operator is constructed and applied repeatedly to arrive at the optimal solution. Value iteration is a first-order method and, therefore, it may take a large number of iterations to converge to the optimal solution. Successive relaxation is a popular technique that can be applied to solve a fixed-point equation. It has been shown in the literature that, under a special structure of the MDP, successive overrelaxation computes the optimal value function faster than standard value iteration. In this article, we propose a second-order value iteration procedure obtained by applying the Newton–Raphson scheme to this fixed-point equation. We prove the global convergence of our algorithm to the optimal solution asymptotically and show its second-order convergence. Through experiments, we demonstrate the effectiveness of our proposed approach.
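As a rough illustration of the ideas summarized above, the sketch below implements tabular value iteration, a naive successive-relaxation variant, and a Newton–Raphson step on the Bellman residual using NumPy. The transition tensor P (shape (A, S, S)), reward matrix r (shape (A, S)), the relaxation factor w, and all function names are assumptions made for this example; the operator studied in the article exploits the MDP's structure and differs in detail, so treat this as a minimal sketch rather than the authors' algorithm.

import numpy as np

# Illustrative sketch only: tabular MDP with transition tensor P of shape
# (A, S, S) and reward matrix r of shape (A, S); gamma is the discount factor.
# This is not the authors' exact algorithm.

def bellman_operator(P, r, gamma, v):
    # Q-values for every (action, state) pair, then the greedy maximum.
    q = r + gamma * (P @ v)          # shape (A, S)
    return q.max(axis=0), q.argmax(axis=0)

def value_iteration(P, r, gamma, tol=1e-8, max_iter=100_000):
    # Standard first-order fixed-point iteration v <- T(v).
    v = np.zeros(P.shape[-1])
    for _ in range(max_iter):
        tv, _ = bellman_operator(P, r, gamma, v)
        if np.max(np.abs(tv - v)) < tol:
            return tv
        v = tv
    return v

def relaxed_value_iteration(P, r, gamma, w=1.05, tol=1e-8, max_iter=100_000):
    # Naive successive-relaxation update v <- w*T(v) + (1 - w)*v.
    # This plain form stays a contraction only for w < 2 / (1 + gamma);
    # the article's operator and its admissible w depend on the MDP's structure.
    v = np.zeros(P.shape[-1])
    for _ in range(max_iter):
        tv, _ = bellman_operator(P, r, gamma, v)
        v_new = w * tv + (1.0 - w) * v
        if np.max(np.abs(v_new - v)) < tol:
            return v_new
        v = v_new
    return v

def newton_value_iteration(P, r, gamma, tol=1e-8, max_iter=1_000):
    # Newton-Raphson applied to the Bellman residual F(v) = T(v) - v:
    # each step evaluates the current greedy policy exactly by solving a
    # linear system, which yields policy-iteration-like updates.
    n_states = P.shape[-1]
    v = np.zeros(n_states)
    for _ in range(max_iter):
        tv, greedy = bellman_operator(P, r, gamma, v)
        if np.max(np.abs(tv - v)) < tol:
            return v
        p_pi = P[greedy, np.arange(n_states), :]   # transitions under greedy policy
        r_pi = r[greedy, np.arange(n_states)]      # rewards under greedy policy
        v = np.linalg.solve(np.eye(n_states) - gamma * p_pi, r_pi)
    return v

On a small random MDP (with gamma small enough that the naive relaxed update remains a contraction), all three routines should return approximately the same optimal value function, with the Newton-style updates typically needing far fewer sweeps than the first-order iteration.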

Related articles

Interactive Value Iteration for Markov Decision Processes with Unknown Rewards

To tackle the potentially hard task of defining the reward function in a Markov Decision Process, we propose a new approach, based on Value Iteration, which interweaves the elicitation and optimization phases. We assume that rewards whose numeric values are unknown can only be ordered, and that a tutor is present to help compare sequences of rewards. We first show how the set of possible rewa...

Fast Value Iteration for Goal-Directed Markov Decision Processes

Planning problems where effects of actions are non-deterministic can be modeled as Markov decision processes. Planning problems are usually goal-directed. This paper proposes several techniques for exploiting the goal-directedness to accelerate value iteration, a standard algorithm for solving Markov decision processes. Empirical studies have shown that the techniques can bring about signi...

Approximate Value Iteration for Risk-aware Markov Decision Processes

We consider large-scale Markov decision processes (MDPs) with a risk measure of variability in cost, under the risk-aware MDPs paradigm. Previous studies showed that risk-aware MDPs, based on a minimax approach to handling the risk measure, can be solved using dynamic programming for small to medium sized problems. However, due to the “curse of dimensionality”, MDPs that model real-life problem...

Topological Value Iteration Algorithm for Markov Decision Processes

Value Iteration is an inefficient algorithm for Markov decision processes (MDPs) because it puts the majority of its effort into backing up the entire state space, which turns out to be unnecessary in many cases. In order to overcome this problem, many approaches have been proposed. Among them, LAO*, LRTDP and HDP are state-of-the-art ones. All of these use reachability analysis and heuristics t...

Generalized Semi-Markov Processes: Antimatroid Structure and Second-Order Properties

A generalized semi-Markov scheme models the structure of a discrete event system, such as a network of queues. By studying combinatorial and geometric representations of schemes we find conditions for second-order properties (convexity/concavity, sub/supermodularity) of their event epochs and event counting processes. A scheme generates a language of feasible strings of events. We show that monot...

Journal

Journal title: IEEE Transactions on Automatic Control

Year: 2022

ISSN: 0018-9286, 1558-2523, 2334-3303

DOI: https://doi.org/10.1109/tac.2021.3112851